March 5th, 2021

The relevant xkcd comic

If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.

Quick note on jargon

Technobabble is the name of the game here, but here’s a rough guide to interpret Git-speak:

  • Git : “The free computer program you’re about to learn about”
  • GitHub/GitLab : “Two websites (remotes) where you can store repositories”
  • Repo : “short for repository, a folder that is tracked with version control”
  • Remote : “A (remote) server that houses a copy of your repo”
  • Commit : “Read: take a snapshot in time, or record present state”
  • Branch / Fork : “Read: creating a parallel universe”

Agenda

  1. What is version control?
  2. Using Git locally (a crash-course in Git)
  3. Using Git with all your friends (remotes and how to talk to them)

Part 0. What is version control

and what is “Git”

How many of you know the pain?

Is this version control?

Is this good version control?

Motivating problems

Why are ad-hoc approaches often insufficient?

  • Lack of standards leaves collaborators confused
  • When things go bad, they get really bad
  • Project “state” is ambiguous

So what is Git?

Okay, but what is it

Git is a tool for tracking the state of your project

  • Maintain a detailed revision history
  • View tracked files at any prior state
  • Create separate “parallel universes” seamlessly
  • Merge changes in a sensible and predictable way

How to Git it

Gitting set up

Git for Windows

  • Required regardless of what interface you use
  • The command line can be daunting for the casual user
  • Default commit message editor is Vim (which is really daunting to the casual user)

Gitting set up

GitHub Desktop

  • Serves as a “point-and-click” interface for Git operations
  • Easily manage a large number of repositories from one app
  • I will be using GitHub Desktop to demonstrate a graphical interface

Part 1. Local Git commands

Let’s git going

Local Git commands

Primary objectives:

  • Start a new repository
  • Create and add files to it
  • Create and merge branches
  • Check out previous revisions / other branches

Local Git commands

I’ll present each Git operation with its command-line name.

Focus on the association of the name to its respective operation, but don’t worry about memorizing them!

(We will not be using the command line here)

Dataflow visualized

Create a repository

git init

  • git init creates an empty repository to begin tracking your files
  • Ideally you would create a repository at the start, but you can do so at any time
  • Not every file present in your project folder has to be tracked in the repo
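For anyone curious what this looks like at the command line, here is a minimal sketch. The temporary folder and the file names analysis.R and scratch_notes.txt are made up for illustration.

```shell
# Hypothetical project folder that already has some work in it
project=$(mktemp -d)
cd "$project"
echo 'x <- 1' > analysis.R
echo 'todo' > scratch_notes.txt

# Start tracking at any time; this only creates the hidden .git folder
git init -q

# Nothing is tracked yet: both files are listed as untracked ("??")
git status --short
```

Note that git init does not add any files for you. Deciding what gets tracked is a separate step.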

View file status

git status

git status displays the current state of files in your project

With Git there are four “states” that your files can reside in:

  • Untracked
  • Unmodified
  • Modified
  • Staged
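As a rough command-line illustration, the sketch below puts one file into each of the four states and then asks git status to report them. The file names and the "Demo User" identity are assumptions made for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Assumed identity, set for this throwaway repo only
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > unmodified.txt
echo one > modified.txt
git add unmodified.txt modified.txt
git commit -q -m "Add two tracked files"

echo two >> modified.txt        # tracked and changed: modified
echo new > staged.txt
git add staged.txt              # new file, added: staged
echo scratch > untracked.txt    # never added: untracked

# Short status: " M" = modified, "A " = staged, "??" = untracked.
# unmodified.txt is not listed because there is nothing to report.
git status --short
```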

Tracking your files

git add

To add a file to your repository, we use the git add command.

  • We can also use this command to stage changes of modified files
  • Files added are “staged”, but we haven’t fully “saved” or “committed”
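A small sketch of both uses of git add, under the assumption of a throwaway repo with a made-up script.R. It also shows a detail that surprises people: edits made after git add are not part of what was staged.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo 'x <- 1' > script.R
git add script.R                  # untracked -> staged
git commit -q -m "Add script"

echo 'y <- 2' >> script.R         # unmodified -> modified
git add script.R                  # modified -> staged
git diff --cached --name-only     # lists what is staged: script.R

echo 'z <- 3' >> script.R         # edited again after staging
git status --short                # "MM": staged change plus a newer unstaged one
```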

Recording your changes

git commit

git commit creates a record of all tracked files that you’ve staged.

Each commit records three core parts:

  • Commit message (“Made implicitly missing data explicit”)
  • Commit author (me)
  • SHA-1 hash (unique to every commit, used to reference it)
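To see those three parts for yourself, here is a minimal sketch that makes one commit and prints it back. The file survey.csv and the "Demo User" identity are assumptions for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

printf 'id,score\n1,NA\n' > survey.csv
git add survey.csv
git commit -q -m "Made implicitly missing data explicit"

# Print the message, author, and full hash of the latest commit
git log -1 --format='message: %s%nauthor:  %an%nhash:    %H'
```

The hash will differ on every machine and every run, since it depends on the content, author, and timestamp.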

Removing a file from the repo

git rm

git rm removes a file both from disk and from the repo’s index

  • Equivalent to deleting the file and then staging the change
  • Removes the file from the current index (and all future commits); earlier commits still contain it
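A short sketch of git rm in a throwaway repo, assuming a made-up file called old_data.csv. The last command shows that the file is still retrievable from the commit before the removal.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo 'temporary' > old_data.csv
git add old_data.csv
git commit -q -m "Add data file"

git rm -q old_data.csv                    # deletes the file and stages the deletion
git commit -q -m "Remove old_data.csv"

ls                                        # old_data.csv is gone from disk
git show HEAD~1:old_data.csv              # ...but the previous commit still has it
```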

Be mindful of what you track!

With very few exceptions, everything tracked within Git can be recovered.

  • In a “parallel universe”, you never deleted that data file
  • Begin thinking about this before you make a commit
  • Be especially careful of PII, passwords, private keys, etc.

“ignoring” files

or: practicing mindfulness

With a .gitignore file in your repo, you can specify explicitly what files or folders you want to remain un-tracked.

  • This is simply a file in your root folder
  • Templates are available online for many languages and project types
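Here is a minimal sketch of a .gitignore at work. The patterns and file names (a data/ folder, *.log files, script.R) are invented for the example; your own ignore list will depend on the project.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ignore everything under data/ and any file ending in .log
printf '%s\n' 'data/' '*.log' > .gitignore

mkdir data
echo '1,2,3' > data/raw.csv
echo 'started' > run.log
echo 'x <- 1' > script.R

# Only .gitignore and script.R are offered for tracking
git status --short

# Ask Git which pattern is responsible for ignoring a path
git check-ignore -v run.log data/raw.csv
```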

Seeing it in action

Demonstration

Let’s see how all of these commands work for a simple task:

  • Initialize a new repository in GitHub Desktop
  • Create and commit a simple script from RStudio
  • Create and commit a .gitignore file
  • Remove an unwanted file from the repository

Enter: Branching

Changes don’t have to be linear!

A branch allows you to diverge from the main track while keeping the state of the main track intact.

  • A branch inherits its history from its parent
  • Easily merge back in changes you made in the branch (even if the parent continues to develop independently)

Creating a branch

git branch

git branch creates a new branch off of the current one

  • If you’re just starting out, the current branch is likely master or main
  • Visualize it yourself
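A quick command-line sketch, assuming a throwaway repo and a made-up branch name new-analysis. One thing worth noticing is that git branch creates the branch but leaves you on the one you were already on.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo 'x <- 1' > script.R
git add script.R
git commit -q -m "Add script"

# "master" or "main", depending on how Git is configured
base=$(git symbolic-ref --short HEAD)

git branch new-analysis          # create the branch (hypothetical name)
git branch --list                # both branches listed; * marks the current one
git symbolic-ref --short HEAD    # still on the original branch
```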

Switching branches

git checkout

git checkout is used to switch between branches and a few more things:

  • The same function is used to “checkout” a previous commit
  • Also used to “un-modify” a file (i.e., modified -> unmodified)
    • git checkout -- <file>
    • This effectively erases any unstaged progress!
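The three uses of git checkout can be sketched as follows, assuming a throwaway repo with a made-up notes.txt and a branch named experiment.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo "draft 1" > notes.txt
git add notes.txt
git commit -q -m "First draft"
first=$(git rev-parse HEAD)               # remember the first commit
echo "draft 2" > notes.txt
git commit -q -am "Second draft"
base=$(git symbolic-ref --short HEAD)

# 1. Switch branches (-b creates the branch first)
git checkout -q -b experiment
git checkout -q "$base"

# 2. Check out a previous commit to look around, then come back
git checkout -q "$first"
old=$(cat notes.txt)                      # the file as it was: draft 1
git checkout -q "$base"

# 3. Throw away unstaged edits to a file (no undo for this!)
echo "half-finished idea" >> notes.txt
git checkout -- notes.txt
cat notes.txt                             # back to the committed version
```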

Merging branches

git merge

git merge is used to merge another branch into the current one.

  • Merging does not automatically delete the branch being merged in
    • The history of the other branch is also added to the current branch
  • Most local merges will be “fast-forward” merges
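Below is a small sketch of a fast-forward merge, assuming a throwaway repo and a made-up branch called add-methods. The --ff-only flag is optional; it asks Git to refuse the merge if a fast-forward is not possible, which makes the behavior explicit here.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo "intro" > report.txt
git add report.txt
git commit -q -m "Start report"
base=$(git symbolic-ref --short HEAD)

# Do some work on a branch while the parent stays put
git checkout -q -b add-methods
echo "methods" >> report.txt
git commit -q -am "Add methods section"

# Back on the parent, merge the branch in
git checkout -q "$base"
git merge -q --ff-only add-methods

git log --oneline                 # two commits, no extra merge commit
git branch --list add-methods     # the merged branch still exists
```

Because the parent branch had no new commits of its own, Git only had to move its pointer forward.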

Let’s take a look

Working with branches

Looking back at the same repo we created earlier, let’s try out these new commands:

  • Create a new branch and check it out
  • Make changes in our new branch and merge them into master
  • Checkout a past commit

Where merges git dicey

If you (or your team) develop on more than one branch concurrently, you may run into merge conflicts.

  • If the same line is edited across branches, Git will not know which one takes precedence in a merge
  • Several merge strategies exist to automate this
  • For small merges it’s often easier to work through them manually

Fixing a merge conflict

Git inserts conflict markers into any file containing a merge conflict to show which lines differ

<<<<<<< <current_branch>
I'm not a cat
=======
I'm not a lawyer
>>>>>>> <branch_being_merged>

  • Edit the file so it reads as you want it to appear (removing the markers), then stage the changes
  • Sometimes it’s not an A/B problem and will require revision
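The sketch below reproduces that conflict on purpose and then resolves it by hand. The repo, the file statement.txt, and the branch name lawyer are all made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"

echo "I'm here live" > statement.txt
git add statement.txt
git commit -q -m "Opening statement"
base=$(git symbolic-ref --short HEAD)

# Edit the same line on two branches
git checkout -q -b lawyer
echo "I'm not a lawyer" > statement.txt
git commit -q -am "Lawyer edit"
git checkout -q "$base"
echo "I'm not a cat" > statement.txt
git commit -q -am "Cat edit"

# The merge stops and reports the conflict
git merge lawyer || echo "merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files still in conflict
cat statement.txt                                     # shows the markers

# Resolve by writing the file the way we want it, then stage and commit
echo "I'm here live, I'm not a cat" > statement.txt
git add statement.txt
git commit -q -m "Merge branch 'lawyer'"
```

Here the resolution is a new line rather than either side, which is the "not an A/B problem" case.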

Let’s demonstrate…

Part 2. Using Git with all your friends

Almost there!

Starting a new project on GitLab

When using GitLab or GitHub, you can start your repository directly from the website

  • You can also create a blank repository and push a repo you’ve already created locally
  • Projects can be housed under either a user or group namespace
  • Optionally set visibility (just how collaborative do you want to be?)

Let’s take a look…

Interacting with remotes

Now that we have a remote repository set up, we need to introduce just three more git commands to interact with that repository:

  • git clone
  • git pull
  • git push

Recall our Dataflow

Cloning an existing repository

git clone

git clone creates a local copy of a remote repository

  • You Only Clone Once (YOCO)
  • By default, the directory will be named after the title of the repository
  • By default, you clone the whole repository rather than bits and pieces of it
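To try cloning without a GitLab or GitHub account, the sketch below uses a bare repository on disk as a stand-in for the remote. The name survey-analysis.git and the folder layout are assumptions for the example; with a real remote you would pass its URL instead of a path.

```shell
work=$(mktemp -d)

# Build a small repo and publish it as a stand-in "remote"
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"
echo "# Survey analysis" > README.md
git add README.md
git commit -q -m "Add README"
git clone -q --bare "$work/seed" "$work/survey-analysis.git"

# Clone it once, somewhere else
mkdir "$work/laptop"
cd "$work/laptop"
git clone -q "$work/survey-analysis.git"

# The folder is named after the repository, and the remote is called "origin"
cd survey-analysis
git remote -v
git log --oneline
```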

Gitting changes from remote

git pull

git pull fetches any updates from a remote repository and merges them locally

  • Notice that I said it merges those changes (be wary)
  • git fetch is safer because it does not attempt to merge automatically
  • When in doubt, just look at the changes online
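Here is a sketch of the difference between fetching and pulling, with two made-up collaborators (alice and bob) sharing a bare repository on disk in place of a real remote. The --ff-only flag on git pull is an optional guard: it makes the pull refuse rather than create a merge if the histories have diverged.

```shell
work=$(mktemp -d)
cd "$work"

# Alice starts the project and publishes it to the stand-in remote
git init -q alice
cd alice
git config user.name "Alice Example"      # assumed identity for the demo
git config user.email "alice@example.com"
echo "v1" > report.txt
git add report.txt
git commit -q -m "Add report"
branch=$(git symbolic-ref --short HEAD)
cd "$work"
git clone -q --bare alice remote.git
git clone -q remote.git bob               # Bob clones once

# Alice keeps working and pushes an update
cd "$work/alice"
git remote add origin "$work/remote.git"
echo "v2" > report.txt
git commit -q -am "Update report"
git push -q origin "$branch"

# Bob fetches first: the update is downloaded but nothing is merged
cd "$work/bob"
git fetch -q origin
incoming=$(git rev-list --count "HEAD..origin/$branch")
git log --oneline "HEAD..origin/$branch"  # review what would come in
before=$(cat report.txt)                  # Bob's file is untouched so far

# Happy with the changes, Bob pulls them in
git pull -q --ff-only
cat report.txt
```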

Pushing your changes

git push

Finally, we send our updates to the remote repository using git push

  • Double check that everything is in order before you push
  • It’s best practice to always pull before you commit and push
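A minimal push sketch, again using a bare repository on disk as a stand-in for GitLab or GitHub. The paths, the file script.R, and the "Demo User" identity are assumptions for the example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stand-in for the remote

git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"          # assumed identity for the demo
git config user.email "demo@example.com"
echo 'x <- 1' > script.R
git add script.R
git commit -q -m "Add script"
branch=$(git symbolic-ref --short HEAD)

# Tell Git where the remote lives
git remote add origin "$work/remote.git"

# Double check before pushing: a clean status and the commits you expect
git status --short
git log --oneline

# Send the branch up; -u remembers the remote branch for future pushes and pulls
git push -q -u origin "$branch"

# Confirm the remote now has the branch
git ls-remote --heads origin
```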

Demonstration

Takeaways

There are many workable models for how best to use these commands

Takeaways

My advice:

  • Commit often, grouping changes logically
  • Branch earlier rather than later
  • Make your commit messages descriptive

There’s a lot more here to talk about

Stashing, blaming, diffing, cherry-picking, rebasing

We will stop here for the sake of time and to avoid overload

Getting help

I hope you’ll continue to explore!

  • These slides (and the Rmd source) are available on GitHub
  • Ready for a deeper dive? Check out the book

Thanks!